AMRITA_CEN$@$SemEval-2015: Paraphrase Detection for Twitter using Unsupervised Feature Learning with Recursive Autoencoders

نویسندگان

  • Mahalakshmi Shanumuga Sundaram
  • Anand Kumar Madasamy
  • Soman Kotti Padannayil
چکیده

We explore using recursive autoencoders for SemEval 2015 Task 1: Paraphrase and Semantic Similarity in Twitter. Our paraphrase detection system makes use of phrase-structure parse tree embeddings that are then provided as input to a conventional supervised classification model. We achieve an F1 score of 0.45 on paraphrase identification and a Pearson correlation of 0.303 on computing semantic similarity.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

Paraphrase detection is the task of examining two sentences and determining whether they have the same meaning. In order to obtain high accuracy on this task, thorough syntactic and semantic analysis of the two statements is needed. We introduce a method for paraphrase detection based on recursive autoencoders (RAE). Our unsupervised RAEs are based on a novel unfolding objective and learn featu...

متن کامل

Reddit Temporal N-gram Corpus and its Applications on Paraphrase and Semantic Similarity in Social Media using a Topic-based Latent Semantic Analysis

This paper introduces a new large-scale n-gram corpus that is created specifically from social media text. Two distinguishing characteristics of this corpus are its monthly temporal attribute and that it is created from 1.65 billion comments of user-generated text in Reddit. The usefulness of this corpus is exemplified and evaluated by a novel Topic-based Latent Semantic Analysis (TLSA) algorit...

متن کامل

A Paraphrase and Semantic Similarity Detection System for User Generated Short-Text Content on Microblogs

Existing systems deliver high accuracy and F1-scores for detecting paraphrase and semantic similarity on traditional clean-text corpus. For instance, on the clean-text Microsoft Paraphrase benchmark database, the existing systems attain an accuracy as high as 0.8596. However, existing systems for detecting paraphrases and semantic similarity on user-generated short-text content on microblogs su...

متن کامل

Ebiquity: Paraphrase and Semantic Similarity in Twitter using Skipgrams

We describe the system we developed to participate in SemEval 2015 Task 1, Paraphrase and Semantic Similarity in Twitter. We create similarity vectors from two-skip trigrams of preprocessed tweets and measure their semantic similarity using our UMBC-STS system. We submit two runs. The best result is ranked eleventh out of eighteen teams with F1 score of 0.599.

متن کامل

GTI: An Unsupervised Approach for Sentiment Analysis in Twitter

This paper presents the approach of the GTI Research Group to SemEval-2015 task 10 on Sentiment Analysis in Twitter, or more specifically, subtasks A (Contextual Polarity Disambiguation) and B (Message Polarity Classification). We followed an unsupervised dependency parsing-based approach using a sentiment lexicon, created by means of an automatic polarity expansion algorithm and Natural Langua...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015